Towards a Lightweight RDMA Para-Virtualization for HPC
Authors
Abstract
Virtualization has gained increasing attention in recent High Performance Computing (HPC) development. While HPC provides scalability and computing performance, HPC in the cloud additionally benefits from the agility and flexibility that virtualization brings. One of the major challenges of HPC in virtualized environments is RDMA virtualization. Existing implementations of RDMA virtualization have focused on supporting VMs running Linux. However, HPC workloads rarely need a full-blown Linux OS. Compared with a traditional Linux OS, emerging library OSes, such as OSv, are becoming popular choices because they provide efficient, portable and lightweight cloud images. To enable virtualized RDMA for lightweight library OSes, drivers and interfaces must be redesigned to accommodate the underlying virtual devices. In this paper we present a novel design, the virtio-rdma driver for OSv, which aims to provide RDMA para-virtualization for lightweight library OSes. We compare this new design with existing implementations for Linux, and analyze the advantages of virtio-rdma's architecture, its ease of migration to different operating systems, and its potential for performance improvement. We also propose a solution for integrating this para-virtualized driver into HPC platforms, enabling HPC application users to deploy their use cases smoothly in a virtualized HPC environment.
Similar Resources
A Smart HPC Interconnect for Clusters of Virtual Machines
In this paper, we present the design of a VM-aware, high-performance cluster interconnect architecture over 10 Gbps Ethernet. Our framework provides a direct data path to the NIC for applications that run on VMs, leaving non-critical paths (such as control) to be handled by intermediate virtualization layers. As a result, we are able to multiplex and prioritize network access per VM. We evaluate ...
Cellule: Lightweight Execution Environment for Accelerator-based Systems
The increasing prevalence of accelerators is changing the high performance computing (HPC) landscape to one in which future platforms will consist of heterogeneous multi-core chips comprised of both general purpose and specialized cores. Coupled with this trend is increased support for virtualization, which can abstract underlying hardware to aid in dynamically managing its use by HPC applicati...
Contain This, Unleashing Docker for HPC
Containers are a lightweight virtualization method for running multiple isolated Linux systems under a common host operating system. Container-based computing is revolutionizing the way applications are developed and deployed. A new ecosystem has emerged around the Docker platform to enable container-based computing. However, this revolution has yet to reach the HPC community. In this paper, we...
Portable, high-performance containers for HPC
Building and deploying software on high-end computing systems is a challenging task. High performance applications have to reliably run across multiple platforms and environments, and make use of site-specific resources while resolving complicated software-stack dependencies. Containers are a type of lightweight virtualization technology that attempts to solve this problem by packaging applicati...
Fault Tolerance for HPC with OpenVZ Virtualization by Lite Migration Toolkit
The reliability of large-scale parallel jobs within a cluster, or even across multiple clusters in a Grid or distributed computing environment, is a long-term issue because of the difficulty of monitoring and managing a large number of compute nodes. To address this issue, a Lite Migration toolkit with a fault tolerance feature has been developed by the Distributed Computing Team...